autodoc: Change dictionary sort order #22441

khwilliamson · 2024-07-29T21:15:32Z

This brings autodoc's dictionary sort order more in line with Data::Dumper sorting.

Upper/lower case continues not to matter, and numbers continue to come after letters, so that ckWARN2() comes after plain ckWARN().

It changes non-leading underscores to come before letters, so that ck_warner comes before ckWARN.

It also makes leading underscores sort after non-leading ones, so that aMY_CXT and aMY_CXT_ come before _aMY_CXT.

jkeenan · 2024-07-29T23:11:37Z

> This makes this more in line with Data::Dumper sorting.

To review this pull request, I built blead through make test_prep and did likewise with a branch built from your p.r. (rebased on blead). I then compared the respective pod/perlapi.pod generated files.

> It changes numbers to come after letters, so that ckWARN2() comes after ckWARN().

Grepping the two generated files (as described below), I couldn't find any =item entries that illustrate this objective. How would I search the generated files to find examples of this objective?

> It changes non-leading underscores to come before letters, so that ABC_DEF comes before ABCDEF, as the former is likely to be seen as two words, and ABC should come before ABCD.

If I grepped for =item entries in the two files ...

ack '^=item C<[^>]*>' pod/perlapi.pod

... then took a diff of the two greps, then I observed:

663,667d662
< =item C<LONGDBLINFBYTES>
< =item C<LONGDBLMANTBITS>
< =item C<LONGDBLNANBYTES>
< =item C<LONG_DOUBLEKIND>
< =item C<LONG_DOUBLESIZE>
672a668,672
> =item C<LONG_DOUBLEKIND>
> =item C<LONG_DOUBLESIZE>
> =item C<LONGDBLINFBYTES>
> =item C<LONGDBLMANTBITS>
> =item C<LONGDBLNANBYTES>

... which appears to meet your objective.
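The comparison described above can also be scripted. The following is a minimal sketch, not part of the pull request: `item_names` and `moved_items` are hypothetical helper names, and the two POD excerpts are made-up stand-ins for the `pod/perlapi.pod` files generated from blead and from the branch.

```perl
use strict;
use warnings;

# Return the names from every "=item C<...>" line, in file order.
sub item_names {
    my ($pod) = @_;
    return $pod =~ /^=item C<([^>]*)>/mg;
}

# Return the names whose position differs between the two lists.
sub moved_items {
    my ($before, $after) = @_;
    my %new_pos;
    @new_pos{@$after} = (0 .. $#$after);
    my @moved;
    for my $i (0 .. $#$before) {
        my $name = $before->[$i];
        push @moved, $name
            if defined $new_pos{$name} && $new_pos{$name} != $i;
    }
    return @moved;
}

# Hypothetical excerpts standing in for the two generated files.
my $blead_pod  = join "\n", map { "=item C<$_>" }
                 qw(aTHX LONGDBLINFBYTES LONG_DOUBLEKIND);
my $branch_pod = join "\n", map { "=item C<$_>" }
                 qw(aTHX LONG_DOUBLEKIND LONGDBLINFBYTES);

my @before = item_names($blead_pod);
my @after  = item_names($branch_pod);
my @moved  = moved_items(\@before, \@after);

print "$_\n" for @moved;
```

With real input, the two excerpts would be replaced by the contents of the two generated files; only entries whose position changed are reported.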

> And it changes so leading underscores come after non-leading, so aTHX comes before _aTHX

Using the search procedure described above, I found in blead:

 4078 =head1 Concurrency
 4079 
 4080 =over 4
 4081 
 4082 =item C<aTHX>
 4083 
 4084 =item C<aTHX_>
 4085 
 4086 Described in L<perlguts>.
 4087 
 4088 =back
 4089

... while in the branch I found:

 4052 =head1 Concurrency
 4053 
 4054 =over 4
 4055 
 4056 =item C<aTHX>
 4057 
 4058 =item C<aTHX_>
 4059 
 4060 Described in L<perlguts>.
 4061 
 4062 =back
 4063

So there was no change in aTHX versus aTHX_. Can you provide an example of where this change took effect?

khwilliamson · 2024-07-30T15:11:07Z

Sorry for my glib description. I forgot that this retained the existing sort order of numbers, where they come after letters. I changed the commit message to give real examples of the things that did change.

tonycoz · 2024-08-01T00:38:56Z

autodoc.pl

-    # Convert all digit sequences to same length with leading zeros, so for
-    # example, 8 will compare less than 16 (using a fill length value that
-    # should be longer than any sequence in the input).
+    # Convert all digit sequences to be the same length with leading zeros, so
+    # that, for example '8' will sort before '16' (using a fill length value
+    # that should be longer than any sequence in the input).
    $a =~ s/(\d+)/sprintf "%06d", $1/ge;
    $b =~ s/(\d+)/sprintf "%06d", $1/ge;

-    # Translate any underscores and digits so they compare after all Unicode
-    # characters
-    $a =~ tr[_0-9]/\x{110000}-\x{11000A}/;
-    $b =~ tr[_0-9]/\x{110000}-\x{11000A}/;
+    # Translate any underscores so they sort lowest.  This causes 'word1_word2'
+    # to sort before 'word1word2' for all words.
+    # And translate any digits so they come after anything else.  This causes
+    #  digits to sort highest)
+    $a =~ tr[_0-9]/\0\x{110000}-\x{110009}/;
+    $b =~ tr[_0-9]/\0\x{110000}-\x{110009}/;
+
+    # Then move leading underscores to the end, translating them to above
+    # everything else.  This causes '_word_' to compare just after 'word_'
+    $a .= "\x{11000A}" x length $1 if $a =~ s/ ^ (\0+) //x;
+    $b .= "\x{11000A}" x length $1 if $b =~ s/ ^ (\0+) //x;

-    use feature 'state';
    # Modify \w, \W to reflect the changes.
-    state $ud = '\x{110000}-\x{11000A}';    # xlated underscore, digits
-    state $w = "\\w$ud";                    # new \w string
+    use feature 'state';
+    state $w = "\\w\0\x{110000}-\x{11000A}";   # new \w string
    state $mod_w = qr/[$w]/;
    state $mod_W = qr/[^$w]/;

-    # Only \w for initial comparison
-    my $a_only_word = uc($a =~ s/$mod_W//gr);
-    my $b_only_word = uc($b =~ s/$mod_W//gr);
-
-    # And not initial nor interior underscores nor digits (by squeezing them
-    # out)
-    my $a_stripped = $a_only_word =~ s/ (*atomic:[$ud]+) (*pla: $mod_w ) //grxx;
-    my $b_stripped = $b_only_word =~ s/ (*atomic:[$ud]+) (*pla: $mod_w ) //grxx;
+    # Strip out \W.
+    my $a_stripped = $a =~ s/$mod_W//gr;
+    my $b_stripped = $b =~ s/$mod_W//gr;


All this duplicated code between $a and $b could go into a function.
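One way this suggestion might look is sketched below. It is an illustration only, not the code that was merged: the helper name `sort_key`, the choice to fold case before translating, and the comparator in the `sort` block are all assumptions. The transformations themselves are copied from the added lines of the diff.

```perl
use strict;
use warnings;

# Hypothetical helper: build the comparison key for one name, applying the
# transformations that the diff applies separately to $a and $b.
sub sort_key {
    my ($key) = @_;

    # Assumption: fold case first, so that case does not matter and lc()
    # never sees the above-Unicode code points introduced below.
    $key = lc $key;

    # Zero-pad digit runs so that '8' sorts before '16'.
    $key =~ s/(\d+)/sprintf "%06d", $1/ge;

    # Underscores sort lowest; digits sort above every Unicode character.
    $key =~ tr[_0-9]/\0\x{110000}-\x{110009}/;

    # Move leading underscores to the end, translated to sort highest.
    if ($key =~ s/^(\0+)//) {
        $key .= "\x{11000A}" x length $1;
    }

    # Strip anything that is not a (translated) word character.
    $key =~ s/[^\w\0\x{110000}-\x{11000A}]//g;

    return $key;
}

# Assumed comparator: compare keys, falling back to the raw names on a tie.
my @names  = qw(ckWARN2 _aMY_CXT ckWARN aMY_CXT_ ck_warner aMY_CXT);
my @sorted = sort { sort_key($a) cmp sort_key($b) or $a cmp $b } @names;

print "$_\n" for @sorted;
```

With the names from the commit message, this sketch yields aMY_CXT, aMY_CXT_, _aMY_CXT, ck_warner, ckWARN, ckWARN2, which matches the ordering the description states.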

khwilliamson force-pushed the api_sort_order branch from c719da9 to 32eaf9d Compare July 30, 2024 15:09

tonycoz reviewed Aug 1, 2024

khwilliamson force-pushed the api_sort_order branch from 32eaf9d to 15b7330 Compare August 1, 2024 19:47

tonycoz approved these changes Aug 5, 2024

khwilliamson merged commit 716d8ca into Perl:blead Aug 6, 2024
33 checks passed

khwilliamson deleted the api_sort_order branch August 6, 2024 03:42
